Abstract

In high energy physics (HEP) experiments, high-speed and
fault-resilient data communication is needed between
detectors/sensors and the host PC. Transient faults can occur in
the communication hardware due to external effects such as
charged particles, environmental noise or radiation, leading to
single- or multiple-bit errors. To keep the communication system
functional in a radiation environment where direct human
intervention is not possible, a high-speed data acquisition (DAQ)
architecture that supports error recovery is necessary.

This design presents an efficient implementation of a field
programmable gate array (FPGA) based high-speed DAQ system with
an optical communication link supported by a multi-bit error
correcting model. The design has been implemented on a Xilinx
Kintex-7 board and tested for board-to-board communication as
well as for PC communication using PCIe (Peripheral Component
Interconnect Express). A data communication speed of up to
4.8 Gbps has been achieved in board-to-board and board-to-PC
communication, and resource utilization and critical path delay
are also reported.

1 Motivation 

Developing an efficient DAQ chain for HEP experiments remains an
immense challenge, as there is a demand for a high data rate, a
low error rate and scope for further development of the system
architecture. The DAQ chain in general consists of analog sensor
hardware followed by a high-resolution analog-to-digital (A/D)
converter, which is connected to the digital part of the DAQ
chain for storage and further processing. In our work, we have
targeted the digital part of the DAQ chain, which communicates
with the host computer for further analysis of data at the back
end. In general, a successful HEP experiment requires a DAQ chain
to handle the following issues:


• Data capacity > 1 Tb/s

• Number of channels > 100k

• Read-out frequency > 100 kHz

• Synchronization limit < 100 ps

We have chosen an FPGA for our hardware prototype because its
reconfigurability suits evolving design requirements, and because
of the availability of design IP and the flexibility of protocol
implementation through hardware/software co-design. Our proposed
system provides a high data rate with transient error correction
capability.

2 Design Requirement

For the DAQ prototype described in this paper we have used two
Xilinx Kintex-7 (KC705) boards, one optical fiber cable, a
jitter-cleaned clock generator (CDCE62005EVM), a power supply
(220 V), one host PC, and Xilinx ISE 14.5 software with the
ChipScope Pro Analyzer tools.

3 System Design

The complete flow of the system with its functional blocks is
shown in Figure 1. A detailed functional description of each
block and its importance is given in this section. In our
prototype we take a 48-bit input along with a 4-bit slow control
field as data, which is transmitted over the optical link.

3.1 Scrambler/De-scrambler

The scrambler reduces the occurrence of long sequences of '1' (or
'0'), which maintains a good DC balance in the input signal
coming from the A/D converter. It enables accurate timing
recovery at the receiver without resorting to redundant line
coding. It has a latency of one clock cycle but, unlike 8b/10b or
7b/8b line coding, adds no redundancy to the system. The
de-scrambler performs the inverse operation at the receiver side.
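The scrambling idea of Section 3.1 can be illustrated with a minimal bit-serial sketch. The text does not state which scrambler polynomial the design uses, so the polynomial 1 + x^39 + x^58 (the one used in 64b/66b Ethernet) is an assumption here, as are the function names; the hardware scrambles 13-bit blocks in parallel, which this sketch does not model.

```python
# Assumed polynomial 1 + x^39 + x^58; the design's own polynomial is not given.
TAP_A, TAP_B = 39, 58
MASK = (1 << TAP_B) - 1

def scramble(bits, state=0):
    """Self-synchronizing scrambler: out[n] = in[n] ^ out[n-39] ^ out[n-58]."""
    out = []
    for b in bits:
        s = b ^ ((state >> (TAP_A - 1)) & 1) ^ ((state >> (TAP_B - 1)) & 1)
        out.append(s)
        state = ((state << 1) | s) & MASK   # register holds past output bits
    return out

def descramble(bits, state=0):
    """Inverse operation: in[n] = rx[n] ^ rx[n-39] ^ rx[n-58]."""
    out = []
    for s in bits:
        b = s ^ ((state >> (TAP_A - 1)) & 1) ^ ((state >> (TAP_B - 1)) & 1)
        out.append(b)
        state = ((state << 1) | s) & MASK   # register holds past received bits
    return out
```

No redundancy is added: the output has exactly as many bits as the input. Because the descrambler state is built only from received bits, it recovers the data after 58 bits even when its initial state differs from the scrambler's.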
3.2 BCH Encoder/Decoder

BCH is a binary error correcting code. This functional block is
used to detect and correct the errors that occur due to radiation
(SEU [3]) or hits by charged particles during transmission. Here,
we have used the (15,7,2) BCH code, which can correct up to two
bit errors. In this coding scheme 7 data bits are appended with
8 redundant bits for error correction, so the coding efficiency
is 7/15 = 0.467. To increase the coding efficiency we could use
the (15,11,1) BCH code, but in that case the error correcting
capability would be reduced from two bits to one. Similarly, we
could use a triple error correcting BCH code, but that would
reduce the coding efficiency. As a compromise between coding
efficiency and error correcting capability we have chosen the
(15,7,2) BCH code. This block adds hardware redundancy and clock
latency to the system in exchange for more reliable data
transfer. Decoding of the coded data follows three steps:

1. Determination of the error locator polynomial

2. Detection of the error locations using the Chien search
algorithm

3. Correction of the data at the error positions
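As a rough illustration of the (15,7,2) code, the sketch below encodes 7 data bits systematically and corrects up to two bit errors. It assumes the standard generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1 of the double-error-correcting BCH(15,7) code. For brevity it decodes with a precomputed syndrome table, which is a stand-in and not the error locator polynomial and Chien search procedure listed above. All names are illustrative.

```python
from itertools import combinations

N, K = 15, 7                 # code word length and data length in bits
G = 0b111010001              # assumed g(x) = x^8 + x^7 + x^6 + x^4 + 1

def poly_mod(value, gen=G):
    """Remainder of value(x) divided by gen(x) over GF(2)."""
    gen_deg = gen.bit_length() - 1
    while value.bit_length() - 1 >= gen_deg:
        value ^= gen << (value.bit_length() - 1 - gen_deg)
    return value

def encode(data):
    """Systematic encoding: 7 data bits followed by 8 parity bits."""
    shifted = data << (N - K)
    return shifted | poly_mod(shifted)

# Map every syndrome to the error pattern (0, 1 or 2 flipped bits) behind it.
SYNDROME_TABLE = {0: 0}
for weight in (1, 2):
    for positions in combinations(range(N), weight):
        error = sum(1 << p for p in positions)
        SYNDROME_TABLE[poly_mod(error)] = error

def decode(word):
    """Return the corrected 7 data bits, or None if the syndrome is unknown."""
    error = SYNDROME_TABLE.get(poly_mod(word))
    if error is None:
        return None
    return (word ^ error) >> (N - K)

print(f"coding efficiency = {K / N:.3f}")   # 7/15
```

The table holds 1 + 15 + 105 = 121 entries, one for each correctable error pattern of a 15-bit code word.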

3.3 Interleaver/De-interleaver

Interleaving is the reordering of the data to be transmitted, so
that consecutive bytes of data are distributed over a larger
sequence of data to reduce the effect of burst errors. The use of
interleaving greatly improves the capacity of the code to correct
burst errors. Normally two kinds of interleaving are used in
communication systems:

1. Block interleaver

2. Convolutional interleaver

Here we have used the block interleaver. This interleaver block
does not add any extra clock latency to the system, so the error
correction capability increases without extra overhead. Figure 2
shows the interleaving process, taking three blocks of data (each
of 4 bits) into consideration. During transmission, if noise
disrupts 4 consecutive bits, the errors are distributed across
the blocks in the received data. So there is a marginal amount of
distortion in each received block instead of the complete loss of
one block during a burst error. The de-interleaver performs the
inverse operation at the receiver side.

Figure 2: Interleaving process and effects of burst error

3.4 MUX/DEMUX and Clock Domain Crossing

This block consists of a dual-port RAM and a read/write
controller. It breaks the 120-bit frame down into three words of
40 bits each. Here, we have used a 120 MHz clock to drive the
multi-gigabit transceiver (MGT). The data rate and clock
frequency can be changed to any value according to the
requirement. This block synchronizes the data between the MGT and
the other parts of the design while keeping the data rate the
same. It also reduces the bandwidth consumption in the channel.
It is used on both the transmitter and receiver sides for
synchronization. Figure 3 shows the architectural block diagram
of the MUX/DEMUX and clock domain crossing.

Figure 3: Mux DeMux for clock domain crossing (Mux: 120 to 40
bits; DeMux: 40 to 120 bits)

3.5 Serializer/De-serializer

This block converts parallel data to serial data, which is
transmitted over the communication channel. It is built into the
MGT. The de-serializer converts the serial data back to parallel
data at the receiver side.

3.6 Frame Aligner and Pattern Search

The frame aligner block is used only on the receiver side. Data
may be affected by noise when transmitted through the channel.
Hence, the frame aligner aligns the frames correctly before
further processing. Every frame has a frame header, which
identifies the frame type and is searched for first at the
receiver side. The frame formats of our system are shown at the
top of Figure 4. The standard frame consists of four fields: a
Header (H) field of 4 bits, a Slow Control (SC) field of 4 bits,
a Data (D) field of 48 bits and a Forward Error Correction (FEC)
field of 64 bits. The SC field is reserved for controlling the
DAQ chain in future. In the extra wide bus frame the first two
fields are the same, but there is no error correction and the
data field is 112 bits wide. So the extra wide bus frame consists
of three fields: header, slow control and data. The extra wide
bus frame format is intended for applications where the
probability of error is very low, such as outside the radiation
zone. The efficiency of data transfer is therefore higher in the
extra wide bus frame format, at the cost of error protection.

The frame aligner and pattern search block consists of two
sub-blocks (pattern search and right shifter), as shown at the
bottom of Figure 4. The right shifter block shifts the received
data by one bit to the right from the MSB side and sends it to
the pattern search block. The pattern search block checks whether
the header has been received. Until the header is detected, a bit
slip command is generated and the search for the header
continues. Once the header is detected, the pattern search block
checks for the header another 32 times; then the search is
complete and the frame is synchronized.

Figure 1: Internal blocks of the proposed system

Figure 4: Different frame formats and Frame Alignment process
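The header search of Section 3.6 can be modelled in software as below. This is a simplified sketch, not the hardware implementation: the received stream is a flat list of bits, each candidate offset stands for one bit slip, and lock is declared when the header is found at the same position in 1 + 32 consecutive frames. The names are hypothetical.

```python
HEADER = (1, 0, 1, 0)     # header of the standard frame, as given in the text
FRAME_BITS = 120
CONFIRMATIONS = 32        # additional header checks after the first match

def align(bits):
    """Return the bit offset at which the frames lock, or None if no lock."""
    needed = CONFIRMATIONS + 1
    for offset in range(FRAME_BITS):             # one step models one bit slip
        starts = [offset + i * FRAME_BITS for i in range(needed)]
        if starts[-1] + len(HEADER) > len(bits):
            return None                          # stream too short to confirm
        if all(tuple(bits[s:s + len(HEADER)]) == HEADER for s in starts):
            return offset
    return None
```

Requiring repeated matches guards against a payload that happens to contain the header pattern once at the wrong position.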

3.7 Data Transfer to Host PC through PCIe

An asynchronous First In First Out (FIFO) buffer and Scatter
Gather Direct Memory Access (SGDMA) are used to transfer data
from the FPGA board to the PC through PCIe. We have used the PCIe
Gen 2 IP core available from Xilinx. The interconnection of the
FPGA to the PC through PCIe is shown in Figure 5. Data is written
into the FIFO at the frequency at which the MGT runs (120 MHz)
and read from the FIFO at the frequency at which the PCIe core
runs (125 MHz). On the PC side we capture the data with a program
written in C using the Windows software development kit (SDK).

The complete chain of functional blocks shown in Figure 1 for the
high-speed DAQ with multi-bit error correction (here, up to two
bits per code word) has been implemented on the FPGA board.
Figure 6 shows the complete flow of the generation of a standard
frame. At first, only the 52-bit user data is scrambled by the
scrambler block: the 52 bits are divided into four 13-bit blocks,
which are scrambled in parallel. The scrambled data together with
the 4-bit header is mapped onto the input lines of eight BCH
(15,7,2) encoders. Each BCH code word protects 7 input bits and
tolerates up to 2 bit errors, so up to 8 x 2 = 16 bit errors per
frame can be corrected using this technique without any extra
resources. The outputs of all the encoders are appended to form a
frame of 120 bits. This 120-bit frame is interleaved first and
then goes to the next functional block, the MUX. Interleaving
(described in Section 3.3) is used to reduce the effect of burst
errors. The position of the header (H), however, does not change
in the frame (red color in Figure 6) even after interleaving,
which helps to synchronize the frame at the receiver side. In the
MUX/DEMUX and clock domain crossing block, a dual-port RAM is
written with this 120-bit data on a 40 MHz clock and read as
40-bit words on a 120 MHz clock. So the data rate for writing is
40 MHz x 120 bits = 4.8 Gbps and for reading is 120 MHz x 40 bits
= 4.8 Gbps, which are the same. The 40-bit data is then
serialized and goes to the transmitter (TX) for transmission over
the optical fiber cable. On the receiver (RX) side the functional
blocks are the De-serializer, DEMUX and clock domain crossing,
De-interleaver, BCH Decoder (15,7,2) and De-scrambler. They
perform the opposite functions of the Serializer, MUX,
Interleaver, BCH Encoder (15,7,2) and Scrambler, respectively.
Only the Frame Aligner and Pattern Search blocks are extra
components on the receiver side. The detailed functional
description of these extra blocks is given in Section 3.6. For
the standard frame format we have used 1010 as the header, and
for the extra wide bus frame 0101.

Figure 5: PCIe interfacing with blocks and Experimental setup of
proposed DAQ
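To illustrate the interleaving step of this chain (Section 3.3), the following sketch uses the toy dimensions of Figure 2, three blocks of 4 bits, rather than the 120-bit frame, whose exact interleaving pattern is not given in the text. It assumes the usual row/column block interleaver: blocks are written as rows and read out column by column. The function names are illustrative.

```python
def interleave(bits, n_blocks=3, block_len=4):
    """Write the blocks as rows, then read the array column by column."""
    assert len(bits) == n_blocks * block_len
    return [bits[r * block_len + c]
            for c in range(block_len) for r in range(n_blocks)]

def deinterleave(bits, n_blocks=3, block_len=4):
    """Inverse of interleave(): restore the original block order."""
    assert len(bits) == n_blocks * block_len
    return [bits[c * n_blocks + r]
            for r in range(n_blocks) for c in range(block_len)]

blocks = list("aaaabbbbcccc")        # three 4-bit blocks a, b and c
channel = interleave(blocks)         # order on the link: a b c a b c ...
channel[3:7] = ["X"] * 4             # a burst hits 4 consecutive positions
received = deinterleave(channel)
errors_per_block = [received[i:i + 4].count("X") for i in (0, 4, 8)]
```

With three interleaved blocks, any burst of 4 consecutive bits leaves at most 2 errors in a single block, which is the kind of error count a double-error-correcting code can handle.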


4 System Implementation 

The full prototype of the DAQ chain is implemented on Xilinx
Kintex-7 boards (KC705 from Avnet) using the Xilinx ISE 14.5
platform and the VHDL language. We have used an external
jitter-cleaned clock source (CDCE62005EVM from TI) to drive the
MGTs of the two Kintex-7 boards. The timing diagrams of the
transmitter and receiver sides are given in Figure 7. The names
of the signals and their functions are given in Table 1.

5 Experimental Setup and Performance

The block diagram and experimental setup of the system are shown
in Figure 5. We achieved a maximum bit rate of 4.8 Gbps in our
system. In standard mode, a frame contains 52 bits of data,
64 bits for error correction (FEC) and 4 bits of header. The
64 FEC bits can correct up to 16 bit errors, as they are spread
over 8 encoder blocks in parallel (2-bit error correction for
each block). In the extra wide bus frame format no error
correction code is used, so the frame can carry 52 + 64 = 116
bits of data out of the 120-bit frame. The data rate achieved
considering only the data (D and SC fields) in standard mode is:

40 MHz x 52 bits = 2.08 Gbps

Similarly, in the mode without error correction (extra wide bus
mode), the data rate is:

40 MHz x 116 bits = 4.64 Gbps

So the data transfer efficiencies of the two modes are
(2.08/4.80) x 100 = 43.33% and (4.64/4.80) x 100 = 96.67%,
respectively.
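The arithmetic above can be reproduced with a few lines; the constants are taken from the text and the function names are illustrative.

```python
FRAME_CLOCK_HZ = 40_000_000   # one 120-bit frame per 40 MHz clock cycle
FRAME_BITS = 120

def payload_rate_gbps(payload_bits):
    """Useful data rate when each frame carries payload_bits of data."""
    return FRAME_CLOCK_HZ * payload_bits / 1e9

def efficiency_percent(payload_bits):
    """Share of the 120-bit frame that carries data."""
    return 100 * payload_bits / FRAME_BITS

for mode, payload in (("standard", 52), ("extra wide bus", 116)):
    print(f"{mode}: {payload_rate_gbps(payload):.2f} Gbps, "
          f"{efficiency_percent(payload):.2f}%")
```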

Resource utilization for each functional block of the system,
including the critical path delay, is given in Table 2. The
critical path delay is the maximum time needed to obtain the
output of a circuit block after the input is applied. Power
consumption is calculated using the Xilinx XPower tool and is
given in Table 3.


Board          Module Name                Logic Power (mW)   Signal Power (mW)
Kintex 7-325t  BCH Encoder (15,7,2)       0.02               0.01
               BCH Decoder (15,7,2)       0.05               0.07
               Scrambler                  0.04               0.00
               Descrambler                0.01               0.00
               Interleaver                0.01               0.01
               Deinterleaver              0.01               0.02
               Frame Aligner              1.34               1.07
               PCIe                       253.24             45.55
               Top Module (without PCIe)  474.18             2.91
               Top Module (with PCIe)     304.24             56.31

Table 3: Module wise power consumption


The video link of the real lab setup is given here. 

https://vimeo.com/113255103 


6 Conclusion 

In this work we have proposed a novel DAQ design for HEP
experiments. The proposed DAQ supports high-speed (Gbps) optical
data communication and also achieves multi-bit error correction.
The DAQ design has been implemented on a Xilinx Kintex-7 board,
and a real test setup has been developed involving board-to-board
communication and PCIe interfacing with a host PC. A detailed
performance analysis of the DAQ implementation is presented in
terms of timing diagrams, resource utilization and critical path
delay of each FPGA block, and power consumption. The proposed DAQ
design and its implementation, involving optical data
communication and multi-bit error correction capability, can be
considered the first of its kind and can serve as a benchmark
design in HEP DAQ.
Signal Name                   Function                                                       Use
Fabric Clk                    Used to drive the different blocks in the DAQ                  Transmitter and receiver side
MGTREF Clk                    Used to drive the MGT                                          Transmitter and receiver side
PLL Clk                       Used to drive the MGT                                          Transmitter and receiver side
PLL Locked                    Output of the PLL; indicates that the PLL generates a stable clock   Transmitter and receiver side
RESET                         Used to reset the whole system                                 Transmitter and receiver side
BUSY 0                        Becomes high when the system enters a process before ready     Transmitter and receiver side
Scrambler                     Contains the output data of the scrambler block                Transmitter side only
Encoder                       Contains the data after BCH encoding                           Transmitter side only
MUX Output                    Contains the output of the MUX block, which is 40 bits wide    Transmitter side only
FRAME ALIGNR RIGHTSHIFT       Stores the received data after shifting right by one bit       Receiver side only
FRAME ALIGNR PATTERN SEARCH   Checks whether the header is matched or not                    Receiver side only
FRAME ALIGNR B SCOUNTER       Stores the output of the counter while the header is not matched   Receiver side only
Header LOCK 0                 Becomes high when the frame is locked                          Receiver side only
FRAME ALIGNR WrAddr           Stores the RAM address where the received data will be written   Receiver side only
RAM_ENABLE                    Becomes high when the RAM is ready to operate                  Receiver side only
Write 40 bit Data             Stores the 40-bit data to be written into the RAM              Receiver side only
DECODER ENABLE                Becomes high when the BCH decoder is ready to operate          Receiver side only
DECODER                       Contains the decoded data                                      Receiver side only
DESCRAMBLER                   Contains the output data of the descrambler block              Receiver side only

Table 1: Description of the signals used in timing diagram
Board          Module Name                Slice Register   Slice LUTs   LUT-Flip Flop   BRAM   Critical path (ns)
Kintex 7-325t  BCH Encoder (15,7,2)       7/407600         951/203800   7/951           0      0.373
               BCH Decoder (15,7,2)       135/407600       446/203800   119/462 (25%)   0      0.985
               Scrambler                  52               53           5               0      0.786
               Descrambler                104              56           5               0      0.689
               Interleaver                44               40           40              0      0.905
               Deinterleaver              201              82           80              0      0.634
               Frame Aligner              115              308          72              0      2.294
               PCIe                       5882             5287         2694            10     3.875
               Top Module (without PCIe)  3665             9003         1998            5      8.62
               Top Module (with PCIe)     8360             8555         3779            26     11.455

Table 2: Resource Utilization